FINAL REPORT FOR GRANT AFOSR 76-2951
Summary 


The  integration  of  feature  extraction  for  pattern  recognition  and  the 
digital signal processing into one study, as performed in this project, has
resulted  in  advances  in  both  areas,  and  the  discovery  of  many  new  ideas 
which  are  beneficial  to  both  areas.  There  are  common  problems,  such  as  the 
finite  sample  size  effect,  in  both  pattern  recognition  and  signal  processing. 
For  example,  digital  signal  processing  techniques  are  much  needed  in  extracting 
effective  features  while  statistical  pattern  recognition  can  be  useful  in 
image  processing.  More  specifically,  this  research  has  carefully  examined 
the  fundamental  problem  of  the  finite  sample  size  and  its  effect  on  feature 
selection  and  classification  rules.  Most  effective  features  for  seismic 
pattern  recognition  have  been  developed  through  the  signal  modelling  study. 

In the image recognition work, new results include the rotationally invariant
digital Laplacian operation and a new adaptive Kalman filtering technique
for efficient realtime image processing. Detailed computer results have
been developed and documented to support the theoretical study. Finally, for
image classification, the specific problem of contextual information is
examined and a decision tree procedure is developed which can process both
the statistical and structural features for effective classification.
FINAL REPORT FOR GRANT AFOSR 76-2951


I.  Statistical  Feature  Extraction 

The work on the distance measures has been completed. A major effort
has been made on the study of finite sample size in statistical pattern
recognition; Appendix I gives a more detailed account of this study.
Under small sample size, many existing theoretical results based on the
infinite sample assumption are not valid, and many analytical problems
do not yet have a solution. For example, the performance of the nearest
neighbor decision rule is not available at finite sample size except for
specific cases. These are difficult analytical problems. The experimental
results depend strongly on the data used, but they do give an
idea of the performance. Certain decision rules perform
better at small sample size than others, and sometimes the degradation
in performance due to finite sample size is not significant. The problem
is therefore important and further work is necessary.

II.  Seismic  Pattern  Recognition 

The two best sets of features for automatic seismic classification are
the short-time spectral features and the parameters of the autoregressive
moving average (ARMA) model. The ARMA model provides a fairly good
spectral match to many seismic records. The use of the autoregressive
(AR) model alone, however, is not adequate. Learning samples should be
chosen carefully, as there are large within-class variations in the
seismic records due to a number of reasons, such as the different
geographical locations of various events. Although a new seismic data
tape was provided by the Seismic Data Analysis Center, the time
limitation of the project did not make it possible to pursue such a study.

Signal modelling appears to play an increasingly important role.
This is a subject which is important to both pattern recognition and
signal processing.
III. Image Pattern Recognition

The modified gradient method, an approximation to the rotationally
invariant digital Laplacian, and an adaptive Kalman filtering method
are the three techniques which are theoretically sound and experimentally
proven by using both the aerial reconnaissance and FLIR imagery which
we have available. For the adaptive filtering method some new results
are shown in Appendix II. The filter has the capability to monitor the
object boundary and make proper adjustments in the filter parameters. In
case the transition matrix is unknown, it can be estimated by using an
on-line estimation method which simultaneously estimates the parameters
and states. This filtering method thus requires little or no prior
knowledge to begin, and the processing is very fast and suitable for
realtime needs.

The rich contextual information in images makes it necessary to
extract both statistical features and structural features. The best
way to utilize both kinds of features in classification is the binary
classification tree, which processes the features sequentially according
to their ranking. The decision is based on majority vote. For
pre-designed trees the required decision time may be a fraction of that
of a single stage classifier. An optimal tree design technique is available.

We feel that the sequential decision tree is very promising for use
in complex recognition systems.
Appendix I for Final Report


FINITE  SAMPLE  CONSIDERATIONS  IN  STATISTICAL  PATTERN  RECOGNITION 


C.  H.  Chen 


Electrical  Engineering  Department 
Southeastern  Massachusetts  University 
North Dartmouth, Massachusetts 02747


Abstract


Most research on statistical pattern
recognition has been based on the assumption of
large or infinite sample size, and thus the asymptotic
performance is of primary interest. In practical
recognition problems the sample size is often
limited and the actual performance may be quite
different from that theoretically predicted. The
design and evaluation of recognition systems must
take into account the finite sample constraint.
There has been little consideration of the
finite sample effects in statistical pattern
recognition because exact solutions are generally
unavailable. The close relation between the
dimensionality and sample size further complicates
the problem. This paper is concerned mainly with
the limitation of learning sample size. The
finite sample effects are considered in three
major problem areas: distance and information
measures, classification rules and contextual
analysis.


1. Introduction


A fundamental assumption frequently made in
statistical pattern recognition is that the number
of samples available for learning (training) or
classification is large or infinite. Thus the
asymptotic performance is of primary interest,
and the recognition system is designed on that basis.
In practice the sample size is limited because the
samples are costly or may not be easily available.
The actual performance may become quite different
from that theoretically predicted. There has been
little theoretical consideration of the finite
sample effects in statistical pattern recognition
because exact solutions are in general unavailable.
The close relation between the dimensionality and
sample size further complicates the problem. In
this paper, the effects of finite sample size will
be considered especially in the problem areas of
distance and information measures, classification
rules, and contextual analysis when the learning
sample size is limited. This study will enhance
the understanding of the fundamental behavior of
statistical recognition systems so that better
systems can be designed.


2. Finite Sample Distance Measure


Distance measures are useful for feature
selection and extraction and for error bounds of
Bayes error probability. They have been extensively
examined in recent years under the assumption
of large sample size (see e.g. [1]). To
determine the distance measures from a limited
number of samples, a maximum likelihood estimation
procedure may be used [2]. The discussion here
will be limited to independent Gaussian samples
for divergence and Bhattacharyya distance but can
be extended to other cases.


Consider first the case of two univariate
Gaussian densities with means $m_1$ and $m_2$ and the
same variance $\sigma^2$, which is known. Let a caret denote
a quantity evaluated by using the sample
estimates. Then the difference between the
estimated and known divergence is

$$\hat{J} - J = \frac{1}{\sigma^2}\left[(\hat{m}_1 - \hat{m}_2)^2 - (m_1 - m_2)^2\right] \tag{1}$$

which has the expected value

$$E(\hat{J} - J) = \frac{1}{N_1} + \frac{1}{N_2} > 0 \tag{2}$$

where $N_1$ and $N_2$ denote the numbers of samples for
classes 1 and 2 respectively. The positive bias
given by Eq. (2) indicates that the divergence
evaluated by using a finite number of samples can
lead to an overoptimistic estimate of the error
probability. The mean-square error of the estimate is

$$E(\hat{J} - J)^2 = 3\left(\frac{1}{N_1} + \frac{1}{N_2}\right)^2 + 4J\left(\frac{1}{N_1} + \frac{1}{N_2}\right) \tag{3}$$

which approaches zero as the sample sizes approach
infinity. Thus $\hat{J}$ is a consistent estimate of $J$.
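The bias and mean-square error of the equal-variance case can be checked by simulation. The following is a minimal sketch, assuming the forms $E(\hat{J} - J) = 1/N_1 + 1/N_2$ and $E(\hat{J} - J)^2 = 3c^2 + 4Jc$ with $c = 1/N_1 + 1/N_2$; the function names, trial count and seed are illustrative choices, not part of the original study.

```python
import random


def theoretical_moments(j_true, n1, n2):
    """Return (bias, mean-square error) of the estimated divergence.

    Assumes known common variance: bias = c, mse = 3c^2 + 4Jc,
    where c = 1/N1 + 1/N2.
    """
    c = 1.0 / n1 + 1.0 / n2
    return c, 3.0 * c * c + 4.0 * j_true * c


def simulated_moments(m1, m2, sigma, n1, n2, trials=20000, seed=0):
    """Monte Carlo estimate of E(J_hat - J) and E(J_hat - J)^2."""
    rng = random.Random(seed)
    j_true = (m1 - m2) ** 2 / sigma ** 2
    sum_err = 0.0
    sum_sq = 0.0
    for _ in range(trials):
        # Sample means of the two classes (the variance is taken as known).
        mean1 = sum(rng.gauss(m1, sigma) for _ in range(n1)) / n1
        mean2 = sum(rng.gauss(m2, sigma) for _ in range(n2)) / n2
        err = (mean1 - mean2) ** 2 / sigma ** 2 - j_true
        sum_err += err
        sum_sq += err * err
    return sum_err / trials, sum_sq / trials


if __name__ == "__main__":
    print("theory:    ", theoretical_moments(1.0, 5, 10))
    print("simulation:", simulated_moments(0.0, 1.0, 1.0, 5, 10))
```

With small $N_1$ and $N_2$ the simulated bias is a sizeable fraction of $J$ itself, which is the overoptimism discussed above.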


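Next consider the univariate Gaussian densities
with zero means and variances $\sigma_1^2$ and $\sigma_2^2$. The
divergence based on the sample estimated parameters is

$$\hat{J} = \frac{1}{2}\left(\frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} + \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2}\right) - 1 \tag{4}$$

The ratio $u = \hat{\sigma}_1^2/\hat{\sigma}_2^2$, normalized by $\sigma_1^2/\sigma_2^2$, has the F-distribution with
$(N_1, N_2)$ degrees of freedom. The expected error
due to the finite sample size is

$$E(\hat{J} - J) = \frac{\sigma_1^2}{\sigma_2^2}\,\frac{1}{N_2 - 2} + \frac{\sigma_2^2}{\sigma_1^2}\,\frac{1}{N_1 - 2} \ge 0 \tag{5}$$

where the positive bias can be significant for
small sample sizes. It can be shown that Eq. (4)
is also a consistent estimate.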

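The Bhattacharyya distance based on the sample
estimated parameters is

$$\hat{B} = \frac{1}{2}\log\left[\frac{1}{2}\left(\frac{\hat{\sigma}_1}{\hat{\sigma}_2} + \frac{\hat{\sigma}_2}{\hat{\sigma}_1}\right)\right] = \frac{1}{4}\log\frac{(1 + u)^2}{4u} \tag{6}$$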

By using the Taylor series expansion of $\hat{B}$ with
respect to the true value $B$, and retaining terms
up to the second order in the expansion, we
obtain:

$$E(\hat{B} - B) \approx \frac{\sigma_1^2 - \sigma_2^2}{4(\sigma_1^2 + \sigma_2^2)}\left(1 - \frac{N_2}{N_2 - 2}\right) + \frac{\sigma_2^4 + 2\sigma_1^2\sigma_2^2 - \sigma_1^4}{8(\sigma_1^2 + \sigma_2^2)^2}\left[1 - \frac{N_2}{N_2 - 2} + \frac{N_2^2(N_1 + 2)}{N_1(N_2 - 2)(N_2 - 4)}\right] \tag{7}$$

which is negative for $\sigma_1^2/\sigma_2^2 \ge 1 + \sqrt{2}$ and positive
otherwise. As the sample sizes approach infinity,
the bias is not zero because of the series
truncation. However, the sample size effect is
evident from Eq. (7).

Next consider the multivariate Gaussian
densities for p-dimensional measurements. Let $\bar{x}_1$
and $\bar{x}_2$ be the sample mean vectors corresponding
to the true mean vectors $\mu_1$ and $\mu_2$ of classes 1 and
2 respectively. Also let $S$ be the sample estimate
of the common covariance matrix $\Sigma$ given by

$$S = \frac{1}{N_1 + N_2 - 2}\left\{\sum_{i=1}^{N_1}(x_i - \bar{x}_1)(x_i - \bar{x}_1)' + \sum_{i=N_1+1}^{N_1+N_2}(x_i - \bar{x}_2)(x_i - \bar{x}_2)'\right\} \tag{8}$$

where $x_i$ is the vector measurement from either class
1 or class 2.

For infinite sample size the exact value of
the divergence is known and given by

$$J = (\mu_1 - \mu_2)'\,\Sigma^{-1}(\mu_1 - \mu_2) \tag{9}$$

which is the same as the Mahalanobis distance.
The divergence using sample estimated parameters
is

$$\hat{J} = (\bar{x}_1 - \bar{x}_2)'\,S^{-1}(\bar{x}_1 - \bar{x}_2) \tag{10}$$


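where $E[S] = \Sigma$. The covariance matrix of $\bar{x}_1 - \bar{x}_2$ is

$$E[(\bar{x}_1 - \mu_1 - \bar{x}_2 + \mu_2)(\bar{x}_1 - \mu_1 - \bar{x}_2 + \mu_2)'] = \Sigma\left(\frac{1}{N_1} + \frac{1}{N_2}\right)$$

Let $k = \frac{1}{N_1} + \frac{1}{N_2}$. The random variable $\hat{J}/k$ has a
Hotelling's $T^2$ non-null distribution (see e.g. [3])
with $N_1 + N_2$ samples and $f = N_1 + N_2 - 2$ degrees
of freedom given by

$$h\left(\frac{\hat{J}}{k}\,\Big|\,f, \frac{J}{k}\right) d\left(\frac{\hat{J}}{kf}\right) = e^{-J/2k}\sum_{r=0}^{\infty}\frac{(J/2k)^r}{r!}\,\frac{(\hat{J}/kf)^{(p/2)+r-1}}{B\left(\frac{p}{2} + r, \frac{f - p + 1}{2}\right)\left(1 + \frac{\hat{J}}{kf}\right)^{(1/2)(f+1)+r}}\,d\left(\frac{\hat{J}}{kf}\right), \quad \hat{J} > 0. \tag{11}$$

By using the formula

$$\int_0^\infty \frac{x^{\mu - 1}}{(1 + x)^{\nu}}\,dx = B(\mu, \nu - \mu)$$

the expectation of $\hat{J}$ can be written

$$E(\hat{J}) = \frac{kfp}{f - p - 1} + \frac{fJ}{f - p - 1} > J \tag{12}$$

which approaches $J$ when the sample sizes become
infinite. Also we can show that

$$E(\hat{J} - J)^2 = \frac{f^2}{(f - p - 1)(f - p - 3)}\left[J^2 + 2Jk(p + 2) + p(p + 2)k^2\right] + J^2 - \frac{2kfpJ + 2fJ^2}{f - p - 1} \tag{13}$$

which approaches zero as the sample sizes approach
infinity. Thus in the multivariate Gaussian case,
$\hat{J}$ is also a consistent estimate of $J$. For equal
covariance matrices but unequal mean vectors, the
Bhattacharyya distance experiences the same effect
as divergence, as $B = J/8$.

For two multivariate Gaussian densities with
zero mean vectors and covariance matrices $\Sigma_1$ and
$\Sigma_2$ whose unbiased estimates are $V_1$ and $V_2$
respectively, the divergence based on sample
estimated parameters is

$$\hat{J} = \frac{1}{2}\,\mathrm{tr}\left(V_1 V_2^{-1} + V_2 V_1^{-1}\right) - p \tag{14}$$

Since the samples from the two classes are
independent,

$$E[\hat{J}] = \frac{1}{2}\,\mathrm{tr}\{E(V_1)E(V_2^{-1}) + E(V_2)E(V_1^{-1})\} - p$$

Here $V_i$ follows the Wishart distribution and $V_i^{-1}$ the
inverse Wishart distribution, with expectations

$$E(V_i) = \Sigma_i, \qquad E(V_i^{-1}) = \frac{N_i}{N_i - p - 1}\,\Sigma_i^{-1}, \qquad i = 1, 2.$$
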
Thus

$$E[\hat{J}] = J + \frac{1}{2}\left[\frac{p + 1}{N_2 - p - 1}\,\mathrm{tr}(\Sigma_1\Sigma_2^{-1}) + \frac{p + 1}{N_1 - p - 1}\,\mathrm{tr}(\Sigma_2\Sigma_1^{-1})\right] > J \tag{15}$$

where the bias term coincides with Eq. (5) for
p = 1, i.e. the univariate case.


The above discussion clearly illustrates the
effect of finite sample size on the bias of the
sample estimated distance measures. In general
the estimated divergence has a positive bias while
the behavior of the estimated Bhattacharyya distance
is less predictable. Direct estimation of the
distance measures is also possible if a
nonparametric density estimate is employed, but it
would be more difficult to study the small sample
behavior.
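The multivariate bias expressions are simple enough to tabulate against sample size and dimensionality. The sketch below assumes the forms $E(\hat{J}) = (kfp + fJ)/(f - p - 1)$ for Eq. (12), the mean-square error of Eq. (13), and the trace form of the bias in Eq. (15); the function names and the argument checks are illustrative additions.

```python
def expected_divergence(j_true, p, n1, n2):
    """E(J_hat) for equal covariance matrices (assumed form of Eq. 12)."""
    k = 1.0 / n1 + 1.0 / n2
    f = n1 + n2 - 2
    if f - p - 1 <= 0:
        raise ValueError("need N1 + N2 - 2 > p + 1")
    return (k * f * p + f * j_true) / (f - p - 1)


def divergence_mse(j_true, p, n1, n2):
    """E(J_hat - J)^2 for equal covariance matrices (assumed form of Eq. 13)."""
    k = 1.0 / n1 + 1.0 / n2
    f = n1 + n2 - 2
    if f - p - 3 <= 0:
        raise ValueError("need N1 + N2 - 2 > p + 3")
    second_moment = (f * f / ((f - p - 1) * (f - p - 3))) * (
        j_true ** 2 + 2.0 * j_true * k * (p + 2) + p * (p + 2) * k * k
    )
    cross = (2.0 * k * f * p * j_true + 2.0 * f * j_true ** 2) / (f - p - 1)
    return second_moment + j_true ** 2 - cross


def covariance_divergence_bias(tr12, tr21, p, n1, n2):
    """Bias of J_hat for zero means and unequal covariances (assumed Eq. 15).

    tr12 = tr(Sigma1 Sigma2^-1), tr21 = tr(Sigma2 Sigma1^-1).
    """
    if n1 - p - 1 <= 0 or n2 - p - 1 <= 0:
        raise ValueError("need N_i > p + 1 for both classes")
    return 0.5 * (p + 1) * (tr12 / (n2 - p - 1) + tr21 / (n1 - p - 1))


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, expected_divergence(1.0, 2, n, n), divergence_mse(1.0, 2, n, n))
```

For example, with $J = 1$, $p = 2$ and five samples per class the expected estimate is 2.88, nearly three times the true divergence, while the same formula gives a value close to 1 for large samples.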


3.  Finite  Sample  Information  Measures 

For feature selection, more informative
features result in lower classification errors.
However, if the sample size is limited, information
measures estimated from samples may not be
as effective. Consider the equivocation for m
classes defined as

$$H = -E\left[\sum_{i=1}^{m} P(\omega_i \mid x)\log P(\omega_i \mid x)\right] \tag{16}$$

where $P(\omega_i \mid x) = P_i$ is the a posteriori probability
of the ith class and the expectation is taken over
the space of x.


The sample-based equivocation using the
estimated a posteriori probability $\hat{P}_i$ is

$$\hat{H} = -E\left[\sum_{i=1}^{m}\hat{P}_i\log\hat{P}_i\right] \tag{17}$$


Let $\theta_i$ be the parameter of the ith class, and $\hat{\theta}_i$
its estimate. Assume that the sample size effect
is small so that we need consider only the first
two terms in the Taylor series expansion of $\hat{P}_i$,

$$\hat{P}_i \approx P_i + P_i'(\hat{\theta}_i - \theta_i) \tag{18}$$

where $P_i'$ is the partial derivative of $\hat{P}_i$ with
respect to $\hat{\theta}_i$ evaluated at $\hat{\theta}_i = \theta_i$. The difference
between the estimated and true equivocations can
be written as

$$\hat{H} - H \approx -E\sum_{i=1}^{m}\left[P_i'(\hat{\theta}_i - \theta_i)(1 + \log P_i) + \frac{P_i'^2}{P_i}(\hat{\theta}_i - \theta_i)^2\right] \tag{19}$$

which is still a function of $\hat{\theta}_i$. It is noted that
both $\hat{P}_i$ and $\hat{H}$ are expanded by using the first order
approximations. If $\hat{\theta}_i$ is an unbiased estimate of
$\theta_i$, then the expectation of the difference with
respect to the estimated parameter depends only
on the variance of $\hat{\theta}_i$, which is usually inversely
proportional to the sample size. The variance of
$\hat{H}$, given by $E(\hat{H} - E(\hat{H}))^2$ where the expectations are
with respect to $\hat{\theta}_i$, is approximately proportional to
the variance of $\hat{\theta}_i$ or inversely proportional to the
sample size. Thus $\hat{H}$ given by Eq. (17) is an
asymptotically unbiased and consistent estimate of
H. To examine the small sample behavior, specific
expressions for $P_i$ and $P_i'$ are needed in order to
evaluate Eq. (19).
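A specific case makes the sample size dependence concrete. The sketch below is an illustration under stated assumptions: two classes with equal priors, unit-variance univariate Gaussian densities, and class means as the only estimated parameters. The equivocation is computed by trapezoidal integration, and the estimated equivocation is taken to be the same quantity with the estimated means substituted throughout. All function names, the integration range and the seed are hypothetical choices.

```python
import math
import random


def gauss_pdf(x, mu):
    """Unit-variance Gaussian density."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def equivocation(mu1, mu2, lo=-12.0, hi=12.0, steps=1200):
    """Two-class equivocation (equal priors) by trapezoidal integration."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        joints = (0.5 * gauss_pdf(x, mu1), 0.5 * gauss_pdf(x, mu2))
        px = joints[0] + joints[1]
        term = 0.0
        for joint in joints:
            if joint > 0.0:
                # p(x) P(w_i|x) log P(w_i|x) equals joint * log(joint / p(x)).
                term -= joint * math.log(joint / px)
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * term * h
    return total


def mean_abs_equivocation_error(mu1, mu2, n, trials=100, seed=0):
    """Average |H_hat - H| when each mean is estimated from n samples."""
    rng = random.Random(seed)
    h_true = equivocation(mu1, mu2)
    total = 0.0
    for _ in range(trials):
        # The sample mean of n unit-variance samples is N(mu, 1/n).
        m1 = rng.gauss(mu1, 1.0 / math.sqrt(n))
        m2 = rng.gauss(mu2, 1.0 / math.sqrt(n))
        total += abs(equivocation(m1, m2) - h_true)
    return total / trials


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, mean_abs_equivocation_error(0.0, 2.0, n))
```

In this setting the average error of the estimated equivocation shrinks as the number of learning samples grows, in line with the consistency argument above.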

4. Finite Sample Discriminant Analysis

Although there is an enormous statistical
literature on discrimination in the Gaussian case,
the available small sample results are few and
inconsistent. The proposed effort will concentrate
on some special cases including the class of
exponential densities. The linear discriminant
function resulting from equal covariance matrices
is the most important special case. The common
covariance matrix may be determined from training
samples of both classes (Eq. 8), which is the
assumption made in much of the statistical literature.

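The difference between the error probabilities can
be approximated by a truncated Taylor series as

$$\int_{\sqrt{\hat{J}}/2}^{\sqrt{J}/2}\frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy \approx -\frac{\hat{J} - J}{4\sqrt{2\pi J}}\exp\left(-\frac{J}{8}\right) + \frac{(\hat{J} - J)^2}{64\sqrt{2\pi J}}\left(1 + \frac{4}{J}\right)\exp\left(-\frac{J}{8}\right) \tag{20}$$
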

The expected value of the difference can be
obtained by using Eqs. (12) and (13). A sample
calculation for  = 5, p = 2 and J = 1 gives
the expected difference -0.0027, which is close
to the expected difference of -0.0312 obtained by using an
expression due to McLachlan [4], which is computed
in [5]. If J increases, the sample sizes must also
increase to maintain a good approximation in
Eq. (20). A similar analysis of error probability
for other discriminant functions using estimated
parameters may not be available, however. In most
cases computer simulation is necessary to determine
the relations among performance, sample sizes and
dimensionality. A good example is the quadratic
discriminant function for unequal covariance
matrices [6]. Unfortunately, consistent results
have not been reported in the literature [5][7]
other than for the linear discriminant function
discussed above. In some cases, including the
exponential densities, a good theoretical approximation
of error probability under finite sample
size is possible. The individual cases must be
examined separately and computer simulation must be
used when necessary. There does not appear to be
a single solution procedure suitable for all cases.
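The quality of the truncated series in Eq. (20) can be examined numerically. The sketch below assumes equal priors, so that the error probability for divergence $J$ is the standard normal tail beyond $\sqrt{J}/2$, and assumes the second-order form $-\frac{d}{4\sqrt{2\pi J}}e^{-J/8} + \frac{d^2}{64\sqrt{2\pi J}}(1 + 4/J)e^{-J/8}$ with $d = \hat{J} - J$. The function names are illustrative.

```python
import math


def bayes_error(j):
    """Normal tail probability beyond sqrt(j)/2 (equal priors assumed)."""
    return 0.5 * math.erfc(math.sqrt(j) / (2.0 * math.sqrt(2.0)))


def error_difference_exact(j_hat, j):
    """Error probability predicted from j_hat minus that from j."""
    return bayes_error(j_hat) - bayes_error(j)


def error_difference_taylor(j_hat, j):
    """Second-order Taylor approximation of the same difference."""
    d = j_hat - j
    scale = math.exp(-j / 8.0) / math.sqrt(2.0 * math.pi * j)
    return -d * scale / 4.0 + d * d * scale * (1.0 + 4.0 / j) / 64.0


if __name__ == "__main__":
    for j_hat in (1.1, 1.5, 3.0):
        print(j_hat,
              error_difference_exact(j_hat, 1.0),
              error_difference_taylor(j_hat, 1.0))
```

The two values agree closely when $\hat{J}$ is near $J$ and drift apart as the estimate moves away, which is why larger sample sizes are needed to keep the approximation useful.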

5. Finite Sample Nearest-Neighbor Decision Rules

The nearest-neighbor decision rule (NNDR) is
attractive in the sense that the NN-risk is upper
bounded by twice the Bayes risk when the sample
size approaches infinity. For a given sample size
the 1-NNDR is uniformly better than the k-NNDR.
The small (finite) sample NNDR is important because
the data storage and computational requirements can
easily be met when the sample size is small. Also,
in realtime processing, the number of samples that
can be processed at a given time must be limited.
However, the small sample behavior of the NNDR is
largely unknown and much study is needed. So
far only the following restrictive cases have been
considered: Fix and Hodges [8] investigated the
small sample performance of 1-NNDR for univariate
and bivariate Gaussian distributions. Kanal et al.
[9] derived the NNDR error probability for binary
patterns. Levine et al. [10] showed that the
performance for small sample sets from uniform
distributions is close to its asymptotic value.

For multivariate Gaussian densities and allowing
the sample size to increase with k, the number
of nearest neighbors, it is shown numerically [5]
that the k-NNDR has a performance very close to
the Bayes linear discriminant analysis. This
indicates that under medium or large sample
conditions, the NNDR is comparable to the Bayes
rule using the estimated parameters.
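Where analytical results are missing, small sample NNDR behavior can at least be explored by simulation. The following is a minimal sketch for two equally likely univariate Gaussian classes with unit variance; it estimates the 1-NNDR error rate for a given learning set size and compares it with the Bayes error. The setup, function names, trial counts and seed are illustrative assumptions rather than the experiments reported in the references.

```python
import math
import random


def bayes_error(separation):
    """Bayes error for two unit-variance Gaussians with equal priors."""
    return 0.5 * math.erfc(separation / (2.0 * math.sqrt(2.0)))


def nn_error_rate(n_per_class, separation, trials=2000,
                  tests_per_trial=20, seed=0):
    """Monte Carlo error rate of the 1-NN rule with small learning sets."""
    rng = random.Random(seed)
    errors = 0
    total = 0
    for _ in range(trials):
        # Draw a fresh learning set: class 0 at mean 0, class 1 at `separation`.
        train = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
        train += [(rng.gauss(separation, 1.0), 1) for _ in range(n_per_class)]
        for _ in range(tests_per_trial):
            label = rng.randrange(2)
            x = rng.gauss(separation * label, 1.0)
            nearest = min(train, key=lambda sample: abs(sample[0] - x))
            if nearest[1] != label:
                errors += 1
            total += 1
    return errors / total


if __name__ == "__main__":
    print("Bayes error:", bayes_error(2.0))
    for n in (2, 5, 20, 100):
        print(n, nn_error_rate(n, 2.0))
```

Averaging over many independently drawn learning sets, as done here, separates the effect of sample size from the luck of a particular learning set.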


For 1-NNDR, the conditional error probability
given the measurement $x$ and its nearest neighbor
$x_j$ is [11]

$$r(x, x_j) = P(\omega_1 \mid x)P(\omega_2 \mid x_j) + P(\omega_2 \mid x)P(\omega_1 \mid x_j) \tag{21}$$

Now the usual assumption that $P(\omega_i \mid x_j)$ approaches
$P(\omega_i \mid x)$ asymptotically does not hold in the small
sample case. For a given parametric or nonparametric
density, the NN-risk for small sample
size can be obtained by taking the expectations
of Eq. (21) with respect to $x$ and $x_j$. Similar
expressions can be written for k-NNDR. As
closed form expressions are generally not available
for the expectations involved, tight bounds
must be established.

It should be noted here that the small sample
NNDR behavior examined here is a different problem
from the edited or condensed NNDR considered elsewhere
(see e.g. [12][11]). However, the idea of
using a small set of selected learning samples is
important. Our experimental results with the
teleseismic data [14] have shown that there is
always a small subset of good learning samples
that dominates the performance. In other words, the
performance would be insensitive to sample size for
good quality learning samples. Thus the small
sample NNDR performance need not be worse than the
asymptotic performance if a small set of learning
samples is properly selected.


6. Contextual Analysis for Image Recognition

A major weakness of statistical pattern
recognition is the difficulty of taking the contextual
relations into account in the recognition process.
Character recognition is not considered here as it
requires a somewhat different contextual analysis
[15]. An imagery pattern is rich in contextual
information, part of which is statistical in nature.
A formal statistical approach to this problem is the
compound decision theory. The finite sample constraint
in digital imagery patterns is caused by the
limited number of image samples available and the
limitation in spatial resolution and quantization
levels. In image interpretation and classification
studies, an image is usually partitioned into a
number of subimages. A vector measurement may be
taken from each subimage. By assuming dependence
on the nearest four neighboring subimages, the
compound decision rule is to choose the class which
maximizes [16][17]

$$p(x \mid \omega_k)\,P(\omega_k)\prod_{\nu=1}^{4} p(y_\nu \mid \omega_k) \tag{22}$$

where k = 1, 2, ..., m, $x$ is the measurement of
the subimage under consideration and $y_\nu$ is that of
the $\nu$th neighboring subimage. Notice that the
part of the expression outside of the product sign
is identical to that used in a simple maximum
likelihood decision rule without considering
neighboring subimages at all. Each multiplier
in the product term represents the contextual
contribution from an adjacent neighboring subimage.
The probability densities required for evaluating
Eq. (22) must be either assumed or determined from
the gray level histogram.


If we assume a Gaussian density for the
measurement x, then the finite sample discriminant
analysis is useful for subimage classification when
the contextual dependence is not considered. If
the contextual information is taken into account,
then the product terms in Eq. (22) produce
additional terms in the discriminant function, causing
some complexity in the error probability computation.
However, the effect of estimated parameters
based on a finite number of image samples can still
be determined under the Gaussian assumption. If both
the finite learning sample and the quantization
and spatial resolution constraints are considered,
the direct use of the histogram would be more suitable.
Let the images of interest consist of objects on a
background with probability densities p(z) and q(z)
respectively, where z denotes the gray level.
Suppose further that the objects occupy fraction
$\theta$ of the image area, so that the background
occupies fraction $1 - \theta$. Then the normalized
histogram of the image is the overall gray level
probability density $\theta p(z) + (1 - \theta)q(z)$. The
thresholding technique [18] can then be used for
each subimage to decide for each pixel (picture
element) whether it belongs to the objects or
background. Eq. (22) can be considered as
representing an object or background histogram
obtained from the object or background pixels of
all five subimages. A minimum error decision
threshold can be obtained from the two histograms
p(z) and q(z). If the subimage under consideration
has more pixels below the threshold then the
decision is in favor of the objects; otherwise the
decision is the background.
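The thresholding procedure can be sketched in a few lines. The example below assumes Gaussian models for the object and background gray level densities, with the objects darker than the background so that pixels below the threshold are assigned to the objects. It scans integer gray levels for the minimum error threshold of the mixture $\theta p(z) + (1 - \theta)q(z)$ and then classifies a subimage by pixel majority. The function names and the numerical values are hypothetical.

```python
import math


def normal_cdf(z, mu, sd):
    """Cumulative distribution function of a Gaussian density."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sd * math.sqrt(2.0))))


def min_error_threshold(theta, mu_obj, sd_obj, mu_bg, sd_bg, levels=256):
    """Integer threshold t minimizing the pixel error when z < t means object."""
    best_t, best_err = 0, float("inf")
    for t in range(levels + 1):
        # Object pixels at or above t are missed; background pixels below t
        # are falsely assigned to the objects.
        miss = theta * (1.0 - normal_cdf(t, mu_obj, sd_obj))
        false_alarm = (1.0 - theta) * normal_cdf(t, mu_bg, sd_bg)
        err = miss + false_alarm
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


def classify_subimage(pixels, threshold):
    """Majority decision: more pixels below the threshold means object."""
    below = sum(1 for z in pixels if z < threshold)
    return "object" if below > len(pixels) - below else "background"


if __name__ == "__main__":
    t, err = min_error_threshold(0.5, 50.0, 20.0, 150.0, 20.0)
    print("threshold:", t, "pixel error:", err)
    print(classify_subimage([40, 60, 55, 160], t))
```

Pooling the pixels of the four neighboring subimages into the list passed to `classify_subimage` is one simple way to bring the contextual information into the decision.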

The above procedure makes it easy not only to
implement the compound decision rule given by
Eq. (22) but also to determine the finite sample
effects. The four neighboring subimages obviously
increase the effective total number of pixels
used for classification. If the object and background
histograms are modelled as Gaussian
densities then the error probability of the compound
Bayes decision rule can be determined from the
Gaussian models using estimated parameters. A more
general approach is to use the sampling distribution
of the histogram for q quantization levels
and a total of n pixels for the image given by

$$\frac{\Gamma(n + q)}{\Gamma(r_1 + 1)\cdots\Gamma(r_q + 1)}\prod_{i=1}^{q} p_i^{r_i} \tag{23}$$

where $r_i$ is the number of pixels belonging to the
ith quantization level. The Bayes estimate of $p_i$,
the fractional number of pixels for the ith level,
is

$$\hat{p}_i = \frac{r_i + 1}{n + q}$$

Then it is possible to determine the mean recognition
accuracy [19] taking into account the contextual
information.
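The Bayes estimate of the level probabilities is straightforward to compute from a histogram of pixel counts. The sketch below assumes the form $\hat{p}_i = (r_i + 1)/(n + q)$, with $q$ the number of quantization levels and $n$ the total number of pixels; the function name is illustrative.

```python
def bayes_level_estimates(counts):
    """Bayes estimates (r_i + 1) / (n + q) of the gray level probabilities."""
    n = sum(counts)   # total number of pixels
    q = len(counts)   # number of quantization levels
    return [(r + 1) / (n + q) for r in counts]


if __name__ == "__main__":
    # Four pixels spread over three quantization levels.
    print(bayes_level_estimates([3, 0, 1]))
```

Unlike the raw fractions $r_i/n$, these estimates never assign zero probability to a level that happens to be empty in a small image sample.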


7.  Remarks 


Although it would be desirable to have a substantial
sample size for all pattern recognition
problems considered, limitation in sample size
frequently occurs in practice. Except for the
uninteresting case of too small a sample size, the
recognition performance, which depends on both
sample size and dimensionality, need not be poor
at small sample size. In designing recognition
systems which operate at small learning sample
size, classification algorithms which are less
sensitive to sample size should be preferred.
Unfortunately no single method can be used to
examine the finite sample effects in all problems
considered. The solutions must be problem
dependent. Series expansion and tight error
bounds should be used if exact solutions are not
available. A distinction among small sample size,
medium sample size, and large sample size should
also be made in each problem area.